TI Predicting functions from protein sequences: where are
the bottlenecks?

AB . . . of sequence data does not necessarily lead to an increase in 
knowledge about the functions of genes and their products. 
Prediction of function using comparative sequence 
analysis is extremely powerful but, if not performed appropriately, may 
also lead to the creation and propagation of assignment errors. 
While current homology detection methods can cope with the data flow, the 
identification, verification and annotation of functional features need. 

TI Homology-based gene structure prediction: Simplified matching
algorithm using a translated codon (tron) and improved accuracy by 
allowing for long gaps. 

AB Motivation: Locating protein-coding exons (CDSs) on a eukaryotic 
genomic DNA sequence is the initial and essential step in
predicting the functions of the genes embedded in that part of the 
genome. Accurate prediction of CDSs may be achieved by directly 
matching the DNA sequence with a known protein sequence or 
profile of a homologous family member(s). Results: A new convention for 
encoding a DNA sequence into a series. . . this type of analysis. Using 
this convention, a dynamic programming algorithm was developed to align a 
DNA sequence and a protein sequence or profile so that the
spliced and translated sequence optimally matches the reference, in the
same way as standard protein sequence alignment that allows for long
gaps. The objective function also takes account of frameshift
errors, coding potentials, and translational initiation, 
termination and splicing signals. This method was tested on Caenorhabditis 
elegans genes of known structures. The accuracy of prediction 
measured in terms of a correlation coefficient (CC) was about 95% at the 
nucleotide level for the 288 genes tested,. . . and closest homologue 
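
The abstract reports accuracy as a nucleotide-level correlation coefficient (CC) but the excerpt does not define it. A minimal sketch follows, assuming the CC is the Matthews-type correlation commonly used to score gene-structure predictions, computed over per-nucleotide coding/non-coding labels. The function name `nucleotide_cc` and the 0/1 label encoding are illustrative choices, not taken from the source.

```python
import math


def nucleotide_cc(true_coding, pred_coding):
    """Correlation coefficient between true and predicted coding labels.

    Both arguments are equal-length sequences of 0/1 flags, one per
    nucleotide (1 = coding, 0 = non-coding). This encoding is an
    assumption made for illustration.
    """
    if len(true_coding) != len(pred_coding):
        raise ValueError("label sequences must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(true_coding, pred_coding):
        if truth and pred:
            tp += 1          # coding nucleotide predicted as coding
        elif truth and not pred:
            fn += 1          # coding nucleotide missed
        elif pred:
            fp += 1          # non-coding nucleotide predicted as coding
        else:
            tn += 1          # non-coding nucleotide predicted as non-coding

    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Undefined when either labelling is all-coding or all-non-coding.
        return float("nan")
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    pred = [1, 1, 1, 0, 0, 0, 0, 0]
    print(round(nucleotide_cc(truth, pred), 4))
```

Under this definition a CC of 1 means every nucleotide is labelled correctly, so a reported value of about 95% would correspond to roughly 0.95 on this scale.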



